Take Home Exercise-5

Reveal the social areas of the city of Engagement, Ohio USA.

Shachi Anirudha Raodeo , true
2022-05-26

The Task

This take-home exercise aims to reveal reveal social areas and visualizing and analyzing locations with traffic bottleneck of the city of Engagement, Ohio USA by using visualization techniques in R.

Assuming the volunteers are representative of the city’s population, characterize the distinct areas of the city that you identify. For each area you identify, provide your rationale and supporting data.

Where are the busiest areas in Engagement? Are there traffic bottlenecks that should be addressed? Explain your rationale.

Links to the dataset:

TravelJournal.csv Employers.csv Pubs.csv Restaurants.csv Buildings.csv Schools.csv Apartments.csv

Step-by-step Data Visualisation

Installing and launching R packages

Packages, namely `tidyverse, sftime, ViSiElse, tmap are required for this exercise. This code chunk installs the required packages and loads them onto RStudio environment.

packages = c('tidyverse','ggplot2','ggdist', 'ggridges','patchwork', 'ggthemes','hrbrthemes','ggrepel','ggforce',"HH","vcd",'scales','grid','gridExtra','formattable','readr', 'ggiraph', 'plotly', 'DT', 'gganimate','readxl','gifski','gapminder','treemap','treemapify','rPackedBar','ggstatsplot','ggside','broom','crosstalk','ViSiElse','zoo', 'lubridate', 'remotes', 'trelliscopejs','data.table','sf','tmap','sf','clock','sftime','rmarkdown')

for (p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

Data Preparation

Data Source

The dataset used in this exercise is Participants.csv, published by the IEEE for [VAST challenge 2022] (https://vast-challenge.github.io/2022/)

Importing the dataset

The code chunk below imports Buildings.csv , TravelJournal.csv, ParticipantStatusLogs10.csv, Schools.csv, Employers.csv, Pubs.csv, Restaurants.csv and Apartments.csv from the data folder into R by using read_csv() function of readr and saves it as Tibble data frame called buildings, travel, logs, schools, employers, pubs, Restaurants and apartments

The TravelJournal.csv contains information about participants’ motivation for movement around the city.

travel <- read_csv("data/TravelJournal.csv")
summary(travel)
schools <- read_sf("data/Schools.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")
pubs <- read_sf("data/Pubs.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")
restaurants <- read_sf("data/Restaurants.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")
buildings <- read_sf("data/Buildings.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")
apartments <- read_sf("data/Apartments.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")
employers <- read_sf("data/Employers.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")
logs <- read_sf("data/ParticipantStatusLogs10.csv", 
                options = "GEOM_POSSIBLE_NAMES=currentLocation")
glimpse(logs)

Data Wrangling

The Travel Journal contains travel data of a participant towards Work/Home Commute, Eating, Coming Back From Restaurant,Recreation (Social Gathering), Going Back to Home.

The other schools, buildings, restaurants, pubs, apartments and employers gives us location of distinct places in the city.

Participant logs gives us the routine log file of a participant over a certain time period.

Calculating Amount Spent

Calculating the total amount spent at the location as a difference of the starting balance and ending balance in the travel journal gives us the amount spent by the participant at a particular location.

travel$amountSpent <- travel$endingBalance -travel$startingBalance

Extract information from timestamp

We use weekdays(), day(), month(), year() functions to extract the day of the week, date, moth and year of checkin to perform time series visualizations.

Calculate the time spent at a particular place and the travel time

Calculate travel time as the difference between the travel start time and the travel end time and calculate the time spent as the difference of check in and check out times.

data_travel= travel%>%
  mutate(weekday = weekdays(checkInTime),
         day = day(checkInTime),
         month=as.character(checkInTime,"%b %y"),
         year = year(checkInTime),
         monthYear = floor_date(checkInTime, "month"),
         travelEndLocationId=as.character(travelEndLocationId),
         timeSpent = checkOutTime - checkInTime,
         travelTime = travelEndTime- travelStartTime,
         participantId=as.character(participantId),
         purpose=as.character(purpose))

data_travel$timeSpent <- as.numeric(as.character(data_travel$timeSpent))
data_travel$travelTime <- as.numeric(as.character(data_travel$travelTime))

Filter necessary columns

data_travel <- data_travel[,c("participantId","travelStartLocationId", "travelEndLocationId", "purpose", "checkInTime", "amountSpent","timeSpent","travelTime","weekday","day","month","year","monthYear")]

Save files as RDS

saveRDS ( data_travel, 'data/data_travel.rds')
saveRDS ( logs, 'data/logs.rds')
data_travel <- readRDS ( 'data/data_travel.rds')
head (data_travel)
# A tibble: 6 × 13
  participantId travelStartLocationId travelEndLocationId purpose     
  <chr>                         <dbl> <chr>               <chr>       
1 23                              532 894                 Recreation …
2 876                              NA 1804                Eating      
3 902                              NA 1801                Eating      
4 919                              NA 1802                Eating      
5 154                              NA 446                 Eating      
6 509                              NA 1801                Eating      
# … with 9 more variables: checkInTime <dttm>, amountSpent <dbl>,
#   timeSpent <dbl>, travelTime <dbl>, weekday <chr>, day <int>,
#   month <chr>, year <int>, monthYear <dttm>
logs <- readRDS ( 'data/logs.rds')
head(logs)
Simple feature collection with 6 features and 11 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -3624.161 ymin: 2870.49 xmax: -1791.201 ymax: 5848.29
CRS:           NA
# A tibble: 6 × 12
  timestamp             currentLocation participantId currentMode
  <chr>                         <POINT> <chr>         <chr>      
1 2022-04-26T00:2… (-2459.039 2982.023) 314           AtHome     
2 2022-04-26T00:2… (-3624.161 5661.512) 315           AtHome     
3 2022-04-26T00:2…  (-1791.201 5848.29) 316           AtHome     
4 2022-04-26T00:2…  (-2384.221 2870.49) 317           AtHome     
5 2022-04-26T00:2… (-2459.754 3053.668) 318           AtHome     
6 2022-04-26T00:2… (-3339.739 5572.409) 319           AtHome     
# … with 8 more variables: hungerStatus <chr>, sleepStatus <chr>,
#   apartmentId <chr>, availableBalance <chr>, jobId <chr>,
#   financialStatus <chr>, dailyFoodBudget <chr>,
#   weeklyExtraBudget <chr>

Data Visualization

Buildings present in Ohio, USA

logs_path <- logs %>%
  mutate(Timestamp = date_time_parse(timestamp,
                                     zone= "",
                                     format = "%Y-%m-%dT%H:%M:%S"))%>%
  mutate(day=get_day(Timestamp))%>%
  filter(currentMode == "Transport")
tmap_mode("view")
tm_shape(buildings)+
tm_polygons(col = "grey60",
           size = 1,
           border.col = "black",
           border.lwd = 1)
tmap_mode("plot")

Location of Employers present in Ohio USA

We notice that employers are distributed throughout the city with a maximum number in the central part of the city which appears more connected.

tmap_mode("view")
tm_shape(buildings)+
tm_polygons(col = "grey60",
           size = 1,
           border.col = "black",
           border.lwd = 1) +
tm_shape(employers) +
  tm_dots(col = "blue")

Location of Schools present in Ohio USA

Since we have only one school present in the city of engagement Ohio, USA. we notice that two schools are present in the north east block of the city, one in th central east and one in the southern block.

tmap_mode("view")
tm_shape(buildings)+
tm_polygons(col = "grey60",
           size = 1,
           border.col = "black",
           border.lwd = 1) +
tm_shape(schools) +
  tm_dots(col = "yellow")

Distribution of Restaurants in Ohio, USA

It is found that the restaurants are more distributed towards the northen and central blovks of the city as compared to the southern block. The southern block only hosts three restaurants.

tmap_mode("view")
tm_shape(buildings)+
tm_polygons(col = "grey60",
           size = 1,
           border.col = "black",
           border.lwd = 1) +
tm_shape(restaurants) +
  tm_dots(col = "red")

Distibution of Apartments in Ohio USA

Housing locations/ apartments are located in clusters in certain blocks of the city.

tmap_mode("view")
tm_shape(buildings)+
tm_polygons(col = "grey60",
           size = 0.5,
           border.col = "black",
           border.lwd = 1) +
tm_shape(apartments) +
  tm_dots(col = "orange")
hex_buildings <- st_make_grid(buildings, 
                    cellsize=100, 
                    square=FALSE) %>%
  st_sf() %>%
  rowid_to_column('hex_id_buil')

points_in_hex_buil <- st_join(logs_path, 
                         hex_buildings, 
                         join=st_within) %>%
  st_set_geometry(NULL) %>%
  count(name='pointCount', hex_id_buil)
head(points_in_hex_buil)
# A tibble: 6 × 2
  hex_id_buil pointCount
        <int>      <int>
1         169         16
2         212         49
3         225         12
4         226         93
5         227         14
6         228         36
hex_combined_buil <- hex_buildings %>%
  left_join(points_in_hex_buil, 
            by = 'hex_id_buil') %>%
  replace(is.na(.), 0)

tm_shape(hex_combined_buil %>%
           filter(pointCount > 0))+
  tm_fill("pointCount",
          n = 8,
          style = "quantile") +
  tm_borders(alpha = 0.1)
hex_employers <- st_make_grid(employers, 
                    cellsize=100, 
                    square=FALSE) %>%
  st_sf() %>%
  rowid_to_column('hex_id_emp')

points_in_hex_emp <- st_join(logs_path, 
                         hex_employers, 
                         join=st_within) %>%
  st_set_geometry(NULL) %>%
  count(name='pointCount', hex_id_emp)
head(points_in_hex_emp)
# A tibble: 6 × 2
  hex_id_emp pointCount
       <int>      <int>
1         25         19
2         42         18
3         68         24
4         69         33
5         70         12
6         71         16
hex_combined_emp <- hex_employers %>%
  left_join(points_in_hex_emp, 
            by = 'hex_id_emp') %>%
  replace(is.na(.), 0)

tm_shape(hex_combined_emp %>%
           filter(pointCount > 0))+
  tm_fill("pointCount",
          n = 8,
          style = "quantile",colorNA = 'white', palette = "-magma") +
  tm_borders(alpha = 0.1)